Motivation:

An analysis of almost any social media data can reveal how subgroups of a population interact with each other on a large scale. We are interested in the content of these interactions and how it varies across the United States over the few days that our data spans.

Related work: Anything that inspired you, such as a paper, a web site, or something we discussed in class.

Initial questions:

What questions are you trying to answer? How did these questions evolve over the course of the project? What new questions did you consider in the course of your analysis?

Data:

Source, scraping method, cleaning, etc.

Exploratory analysis:

Visualizations, summaries, and exploratory statistical analyses. Justify the steps you took, and show any major changes to your ideas.

Additional analysis:

If you undertake formal statistical analyses, describe these in detail.

Discussion:

What were your findings? Are they what you expect? What insights into the data can you make?

##                 sentiment count
## anger               anger 13605
## anticipation anticipation 52960
## disgust           disgust 12668
## fear                 fear 19942
## joy                   joy 46690
## sadness           sadness 21882
## surprise         surprise 22067
## trust               trust 76347

From this graph, we noticed that some time intervals are missing from our data set. We are not sure why; one possible explanation is that the website from which we obtained the data did not scrape during these times.

##           hashtags  Freq
## 1              job 51511
## 2           hiring 45428
## 3             jobs 21910
## 4        careerarc 20717
## 5           retail  7454
## 6      hospitality  7311
## 7          nursing  5091
## 8       healthcare  4702
## 9         veterans  4471
## 10           sales  3310
## 11              it  2179
## 12 customerservice  1927
## 13  transportation  1568
## 14           sonic  1520
## 15   manufacturing  1476
## 16           photo  1432
## 17    businessmgmt  1348
## 18      accounting  1053
## 19     engineering   970
## 20         traffic   955

When mapping the positive scores for all tweets, we see low to moderate scores throughout the US. At this scale, we cannot see a definitive trend at the state level. However, we do see that relatively few tweets were generated in the Midwest or the Northwest. There also seem to be slightly more positive tweets from the middle of the country.

When mapping sentiment across the entire US, we see an overwhelming number of “trust” tweets. We are not quite sure what this emotion means.

When we filter out trust, we see that surprise and joy seem to be commonly tweeted emotions.

Because the entries in our location column vary in specificity, we built a function that takes the latitude and longitude of each tweet and returns the state in which the tweet originated. We then added that state to our original dataset.
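The coordinate-to-state lookup can be sketched as follows. This is a minimal illustration in Python, not the function we actually used: the `STATE_POLYGONS` table, the function names, and the sample tweets are all hypothetical, and the two state outlines are rough rectangles rather than real boundaries. A real lookup would load full state boundary polygons from a shapefile or similar source.

```python
# Hypothetical, heavily simplified state outlines given as
# (longitude, latitude) vertices. Real boundaries are far more detailed.
STATE_POLYGONS = {
    "Colorado": [(-109.05, 37.0), (-102.05, 37.0), (-102.05, 41.0), (-109.05, 41.0)],
    "Wyoming": [(-111.05, 41.0), (-104.05, 41.0), (-104.05, 45.0), (-111.05, 45.0)],
}


def point_in_polygon(lon, lat, polygon):
    """Ray-casting test: count how many polygon edges a ray cast
    from the point in the +longitude direction crosses."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the point's latitude can be crossed.
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def state_for_point(lat, lon, polygons=STATE_POLYGONS):
    """Return the name of the first state whose outline contains the
    point, or None if the point falls outside every known outline."""
    for name, polygon in polygons.items():
        if point_in_polygon(lon, lat, polygon):
            return name
    return None


# Hypothetical sample tweets; only the coordinates matter here.
tweets = [
    {"lat": 39.74, "lon": -104.99},  # near Denver
    {"lat": 41.14, "lon": -104.82},  # near Cheyenne
    {"lat": 0.0, "lon": 0.0},        # outside every outline
]
for tweet in tweets:
    tweet["state"] = state_for_point(tweet["lat"], tweet["lon"])
```

Tweets whose coordinates match no outline receive `None`, which makes them easy to exclude in later steps.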

To evaluate overall sentiment by state, we selected the appropriate columns, then grouped and summed by state, making sure to exclude tweets with missing locations. Maine, Alaska and Hawaii are not included in the data; the count still comes to 48 because Virginia and the District of Columbia receive individual designations.
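The grouping step can be illustrated with a short Python sketch. The field names `state`, `positive`, and `negative`, the function name, and the sample records are assumptions made for this example and do not reflect our actual column names or values.

```python
from collections import defaultdict


def sentiment_by_state(tweets):
    """Sum positive and negative scores per state, skipping tweets
    whose state is missing (None)."""
    totals = defaultdict(lambda: {"positive": 0, "negative": 0})
    for tweet in tweets:
        state = tweet.get("state")
        if state is None:
            continue  # discount tweets with no usable location
        totals[state]["positive"] += tweet["positive"]
        totals[state]["negative"] += tweet["negative"]
    return dict(totals)


# Hypothetical records, for illustration only.
sample = [
    {"state": "Texas", "positive": 2, "negative": 1},
    {"state": "Texas", "positive": 1, "negative": 0},
    {"state": "California", "positive": 3, "negative": 2},
    {"state": None, "positive": 5, "negative": 5},
]
state_totals = sentiment_by_state(sample)
```

Because the totals are sums rather than averages, states with more tweets will tend to score higher on both measures.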

The following heatmap shows the level of positive and negative sentiment across the United States during the 48-hour period of our dataset. Maine, Alaska and Hawaii are blacked out because tweets from those states were not recorded.

We can observe from these two maps that states like California and Texas consistently rank highest, which is presumably related to population. Interestingly, the state with the lowest positive and negative sentiment scores is Washington. There are two possible explanations: a difference in population, or tweets from Washington containing fewer sentiment-bearing words than tweets from other states and therefore generating weaker sentiment scores.